Abstract. Community structure appears to be an intrinsic property of many complex real-world networks. 
However, recent work shows that real-world networks reveal even more sophisticated modules than classical 
cohesive (link-density) communities. In particular, networks can also be naturally partitioned according to 
similar patterns of connectedness among the nodes, revealing link-pattern communities. We here propose 
a propagation based algorithm that can extract both link-density and link-pattern communities, without 
any prior knowledge of the true structure. The algorithm was first validated on different classes of synthetic 
benchmark networks with community structure, and also on random networks. We have further applied 
the algorithm to different social, information, technological and biological networks, where it indeed reveals 
meaningful (composites of) link-density and link-pattern communities. The results thus seem to imply that, 
similarly as link-density counterparts, link-pattern communities appear ubiquitous in nature and design. 



1 Introduction 

Complex real-world networks commonly reveal local cohesive modules of nodes denoted (link-density)
communities [1]. These are most frequently observed as densely connected clusters of nodes that are
only loosely connected to one another. Communities possibly play crucial roles in different real-world
systems [2, 3]; furthermore, community structure also has a strong impact on dynamic processes
taking place on networks [4, 5]. Thus, communities provide an insight into not only structural
organization but also functional behavior of various real-world systems [3, 6, 7, 8].

Consequently, analysis of community structure is currently considered one of the most prominent
areas of network science [9, 10, 11], while it has also been the focus of recent efforts in a wide
variety of other fields. Besides providing many significant theoretical grounds [11], a substantial
number of different community detection algorithms has also been proposed in the literature
[12, 13, 14, 15, 16, 17, 18] (for reviews see [10, 11, 21]). However, most of the past research was
focused primarily on classical communities characterized by higher density of edges [22]. In contrast
to the latter, some recent work demonstrates that real-world networks reveal even more sophisticated
communities [23, 24, 25, 26] that are indistinguishable under classical frameworks.

Networks can also be naturally partitioned according to similar patterns of connectedness among the
nodes, revealing link-pattern communities [23, 24]. (The term was formulated by Long et al. [25].)
Loosely speaking, link-pattern communities correspond to clusters of nodes that are similarly
connected with the rest of the network (i.e., share common neighborhoods). Note that link-density
communities are in fact a special case of link-pattern communities (with some fundamental
differences discussed later on). Thus, some of the research on the former also applies to the latter
[13, 14, 28, 29, 30]. However, contrary to the flourishing literature on classical communities in the
last decade, a relatively small number of authors have considered more general link-pattern
counterparts [23, 24, 25, 26, 31, 32, 53] (in the same sense as in this paper 1). Although this could
be attributed to a number of factors like increased complexity or lack of adequate generative models
and algorithms, more importantly, existence of meaningful link-pattern communities has not been
properly verified under the same framework in various different types of real-world networks that
are commonly analyzed in the literature (still, some networks have been considered in the past).
In this paper we try to address this issue. (Note that a similar stance was also taken by Newman and
Leicht [23].)

We extend balanced propagation [33] with defensive preservation of communities [20] into a general
approach that can extract arbitrary network modules ranging from link-density to link-pattern
communities. To the best of our knowledge, this is the only such algorithm that does not require
some prior knowledge of the true structure (e.g., the number of communities), or does not optimize
some heuristic selected beforehand. We have validated the proposed algorithm on two classes of
synthetic benchmark networks with community structure, and also on random networks. The algorithm
was further applied to different social, information, technological and biological networks, where
it reveals meaningful composites of link-density and link-pattern communities that are well
supported by the network topology. The results thus seem to imply that, similarly as link-density
counterparts, link-pattern communities appear ubiquitous in nature and technology.

a e-mail: lovro.subelj@fri.uni-lj.si

1 Link-pattern communities are known as blockmodels [27] in social networks literature. These were
rigorously analyzed in the past, however, the main focus and employed formulation differ from ours.

Fig. 1. (Color online) Link-density and link-pattern communities (i.e., shaded regions) in (a) karate
and (b) women social networks, respectively (Table 2). The former represents social interactions
among the members of a karate club observed by Zachary [39], while the latter shows social events
(right-hand side) visited by women (left-hand side) in Natchez, Mississippi [40]. Link-density
communities correspond to cohesive modules of nodes, whereas link-pattern communities represent
common patterns of connectedness among the nodes.

The rest of the paper is structured as follows. In Section 2 we discuss the relation between
link-density and link-pattern communities in greater detail, and propose a propagation based
algorithm for their detection. Results on synthetic and real-world networks are presented and
formally discussed in Section 3, while in Section 4 we summarize our main observations and discuss
some prominent directions for future research.



2 Link-density and link-pattern communities 

Although classical link-density communities can be considered under the same framework as
link-pattern communities, there exist some significant differences between the two (Figure 1).
Most obviously, link-pattern communities do not correspond to cohesive modules of nodes; indeed,
such communities commonly do not even feature connectedness. Connectedness is considered a
fundamental structural property of link-density communities, and thus a common ingredient of
different objective functions and community detection algorithms [10].

While link-density communities are often related to the notions of assortative mixing by degree and
homophily [1, 41] (at least in social networks), link-pattern communities might in fact represent
the origin of more commonly observed disassortative degree mixing [42, 43]. As the latter has been
analyzed to a much lesser extent than the former [23], direct dependence has not yet been verified
in real-world networks. Nevertheless, disassortative mixing refers merely to the phenomenon that
nodes mainly connect to dissimilar nodes, and thus outside of their respective (link-pattern)
community. However, how such communities relate to each other, and to the rest of the network,
remains unexplained. (Although network assortativity was most commonly analyzed in the context of
node degree [42, 41], we refer to the notion in general.)

Note also that, as nodes of some link-pattern commu- 
nity are commonly not directly connected, they exhibit 
somewhat higher mutual independence than nodes within 
some link-density community. On the contrary, nodes from 
neighboring link-pattern communities are somewhat more 
dependent than in the case of classical communities. 

Due to all of the above, we strictly distinguish between link-density and link-pattern communities
within the proposed algorithm. However, it ought to be mentioned that this is rather different from
other authors, who have typically considered all communities under the link-pattern regime
[23, 24, 25, 26, 31]. Nevertheless, the latter could be attributed to the fact that other approaches
are mainly based on previous work in social sciences, statistics or artificial intelligence, where
such a setting might be more adequate.

In Section 2.1 we first introduce a balanced propagation based algorithm for classical community
detection, while the algorithm is extended for general community detection in Section 2.2.

2.1 Classical community detection 

Let the network be represented by an undirected and unweighted multi-graph $G(N, E)$, with $N$ being
the set of nodes of the graph and $E$ being the set of edges. Furthermore, let $c_n$ be the community
(label) of node $n$, $n \in N$, and $\mathcal{N}(n)$ the set of its neighbors.

Algorithms presented below are in fact based on label propagation proposed by Raghavan et al. [16].
The label propagation algorithm (LPA) [16] extracts (link-density) communities by exploiting the
following simple procedure. At first, each node is labeled with a unique label, $c_n = l_n$. Then,
at each iteration, each node adopts the label shared by most of its neighbors. Hence,

$$c_n = \operatorname*{argmax}_l \, |\mathcal{N}^l(n)|, \qquad (1)$$

where $\mathcal{N}^l(n)$ is the set of neighbors of $n$ that share label $l$ (ties are broken
uniformly at random). To prevent oscillations of labels, node $n$ retains its current label when it
is among the most frequent in $\mathcal{N}(n)$ [16]. Due to the existence of many intra-community
edges, relative to the number of inter-community edges, nodes in a (link-density) community form a
consensus on some particular label after a few iterations. Thus, when an equilibrium is reached,
disconnected groups of nodes sharing the same label are classified into the same community.
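The procedure above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the adjacency-dictionary input, the function name `label_propagation`, the fixed seed and the stopping test (one full pass without any label change) are assumptions made for illustration only.

```python
import random
from collections import Counter


def label_propagation(adj, max_iter=100, seed=0):
    """Basic LPA on an undirected graph given as {node: [neighbours]}."""
    rng = random.Random(seed)
    labels = {n: n for n in adj}            # each node starts with a unique label
    nodes = list(adj)
    for _ in range(max_iter):
        rng.shuffle(nodes)                  # random update order in every iteration
        changed = False
        for n in nodes:
            if not adj[n]:
                continue                    # isolated nodes keep their label
            counts = Counter(labels[m] for m in adj[n])
            best = max(counts.values())
            candidates = [l for l, c in counts.items() if c == best]
            if labels[n] in candidates:
                continue                    # retain current label if among the most frequent
            labels[n] = rng.choice(candidates)  # ties broken uniformly at random
            changed = True
        if not changed:
            break                           # equilibrium: no node wants to switch
    return labels


# Two disjoint triangles: each triangle should agree on a single label.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
labels = label_propagation(adj)
```

Because labels only travel along edges, the two triangles can never end up sharing a label in this example.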

Due to extremely fast structural inference of label propagation, the algorithm exhibits near linear
time complexity [16, 20] (in the number of edges) and can easily scale to networks with millions
(or even billions) of nodes and edges [20, 44]. Also, due to its algorithmic simplicity, it is
currently among the more commonly adopted algorithms in the literature. Still, label propagation can
be further improved in various ways [45, 46, 47, 48, 38, 20].

In the following we present two advances of the basic approach that improve on its robustness and
community detection strength. Both result in a simple incorporation of node preferences $p_n$ [?]
into the updating rule of label propagation as

$$c_n = \operatorname*{argmax}_l \sum_{m \in \mathcal{N}^l(n)} p_m. \qquad (2)$$

(See equation (1).) Node preferences adjust the propagation strength of each respective node, and
can thus direct the propagation process towards more desirable partitions [46, 20]. Note that
preferences $p_n$ can be set to an arbitrary node statistic (e.g., degree [46]).

To address issues with oscillations of labels in some networks (e.g., bipartite networks), nodes'
labels are updated in a random order [16] (independently among iterations). Although this solves
the aforementioned problem, the introduction of randomness severely hampers the robustness of the
algorithm, and consequently also the stability of the identified community structure. Different
authors have noted that label propagation reveals a large number of different community structures
even in smaller networks, while these structures are also relatively different among themselves
[49, 20].

We have previously shown that updating nodes in some particular order results in higher propagation
strength for nodes that are updated at the beginning, and lower propagation strength for nodes that
are updated towards the end [38]. The order of node updates thus governs the algorithm in a similar
manner as the corresponding node propagation preferences. Based on the latter, we have proposed a
balanced propagation algorithm [38] that utilizes node preferences to counteract (i.e., balance) the
randomness introduced by random update orders. In particular, we introduce the notion of node
balancers that are set to the reverse order in which the nodes are assessed. Thus, lower and higher
propagation strength is assigned to nodes considered first and last, respectively.

Let nodes $N$ be ordered in some random way, and let $i_n$ denote the normalized position of node
$n$ in this order, $i_n \in (0, 1]$. Hence,

$$i_n = \frac{\text{index of node } n}{|N|}. \qquad (3)$$

Node balancers $b_n$ can then be modeled with a simple linear function as $b_n = i_n$. However,
using a logistic curve allows for some further control over the algorithm. Thus,

$$b_n = \frac{1}{1 + \exp(-\beta (i_n - \alpha))}, \qquad (4)$$

where $\alpha$ and $\beta$ are parameters of the algorithm. Intuitively, we fix $\alpha$ to 0.5,
while the stability parameter $\beta$ is set to 0.25 according to some preliminary experiments [38]
(see below). Note that balancers $b_n$ are re-estimated before each iteration, and are incorporated
into the algorithm as node propagation preferences (see equation (6)).
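As a rough illustration of equations (3) and (4), the following Python sketch draws a random update order and derives the logistic balancers from it. The function name `node_balancers`, its return format and the seeded generator are hypothetical choices; only the parameters $\alpha$ and $\beta$ and the two formulas come from the text.

```python
import math
import random


def node_balancers(nodes, alpha=0.5, beta=0.25, rng=None):
    """Return a random update order and the balancer b_n of every node."""
    rng = rng or random.Random(0)
    order = list(nodes)
    rng.shuffle(order)                       # random node order for this iteration
    size = len(order)
    balancers = {}
    for position, n in enumerate(order, start=1):
        i_n = position / size                # normalized position, i_n in (0, 1]
        # Logistic balancer: nodes updated later receive higher strength.
        balancers[n] = 1.0 / (1.0 + math.exp(-beta * (i_n - alpha)))
    return order, balancers


order, balancers = node_balancers(range(10), beta=4.0)
```

With `beta=0.0` every balancer equals 0.5, so all nodes propagate with the same strength; larger values of `beta` separate early and late nodes more strongly.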



Setting the stability parameter $\beta$ to 0 yields the basic label propagation approach, while
increasing $\beta$ significantly improves the robustness of the algorithm. However, computational
complexity thus also increases. Hence, balanced propagation improves the stability of the
identified community structure at the cost of higher complexity, while the trade-off is in fact
governed by the parameter $\beta$. Note that community detection strength of the refined algorithm
is also improved in most cases. For a more detailed discussion see [35].

To even further improve the performance of the algorithm we also adopt defensive preservation of
communities [20]. The strategy increases the propagation strength from the core of each currently
forming community, which results in a strong ability to detect communities, even when they are
only weakly depicted in the network's topology. Laying the pressure from the borders also prevents
a single community from occupying a large portion of the network, which otherwise occurs in, e.g.,
information networks [46]. Thus, the strategy defensively preserves communities and forces the
algorithm to more gradually reveal the final structure. For further discussion see [20, 44].

In the algorithm, community cores are estimated by means of diffusion over the network. The latter
is modeled by employing a random walker within each community. Let $d_n$ be the probability that a
random walker utilized on community $c_n$ visits node $n$. Then,

$$d_n = \sum_{m \in \mathcal{N}^{c_n}(n)} \frac{d_m}{k_m^{c_n}}, \qquad (5)$$

where $k_m^{c_n}$ is the intra-community degree of node $m$. Besides deriving the estimates of cores
and borders, the main objective here is to mimic label propagation within each community, to
estimate the current state of the propagation process, and then to adequately alter its dynamics
(see equation (6)). Note that values $d_n$ are re-estimated according to equation (5) when the
corresponding node updates its label (initially, all $d_n$ are set to $1/|N|$).

Similarly as above, diffusion values $d_n$ are incorporated into the algorithm as node propagation
preferences. Thus, the updating rule for balanced propagation algorithm with defensive preservation
of communities is

$$c_n = \operatorname*{argmax}_l \sum_{m \in \mathcal{N}^l(n)} b_m d_m. \qquad (6)$$

The above is taken as a basis for a general commu- 
nity detection algorithm presented in the following sec- 
tion. Note that the formulation can be extended to weighted 
networks in a straightforward fashion. 
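The preference-weighted update of equation (6) can be sketched as follows. This is an illustrative fragment rather than the authors' code: the helper name `weighted_label_choice` is hypothetical, `weight[m]` stands for the product of balancer and diffusion value of neighbor `m`, and ties are broken deterministically (smallest label) instead of at random, purely to keep the example reproducible.

```python
from collections import defaultdict


def weighted_label_choice(n, adj, labels, weight):
    """Pick the label with the largest summed neighbour weight for node n."""
    scores = defaultdict(float)
    for m in adj[n]:
        scores[labels[m]] += weight[m]       # accumulate b_m * d_m per label
    best = max(scores.values())
    candidates = sorted(l for l, s in scores.items() if s == best)
    # Keep the current label when it is among the best, otherwise switch.
    return labels[n] if labels[n] in candidates else candidates[0]


adj = {"n": ["a1", "a2", "b1"]}
labels = {"n": "x", "a1": "a", "a2": "a", "b1": "b"}
weight = {"a1": 0.1, "a2": 0.1, "b1": 0.5}
choice = weighted_label_choice("n", adj, labels, weight)
```

Here label `a` is held by two neighbors but with a total weight of only 0.2, so the single strongly weighted neighbor carrying label `b` wins; with uniform weights the rule reduces to the majority vote of equation (1).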



2.2 General community detection 

The label propagation algorithm (and its advances) cannot be directly adopted for detection of
link-pattern communities, as the very nature of label propagation demands cohesive (connected)
clusters of nodes (Section 2). However, link-pattern communities can still be seen as cohesive
modules when one considers second order neighborhoods (i.e., nodes at distance 2). Thus, instead of
propagating labels between the neighboring nodes, the labels are rather propagated through a node's
neighbors (i.e., between nodes at distance 2). For instance, when a group of nodes exhibits a
similar pattern of connectedness with other nodes, propagating labels through these latter nodes
would indeed reveal the respective link-pattern community (similarly as for classical label
propagation).
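The idea of propagating through neighbors can be made concrete with a small Python sketch that counts the labels seen at distance 2 from a node. The function name `second_order_label_counts` is hypothetical, and skipping the node itself when walking back from a neighbor is an assumption of this illustration, not something stated in the text.

```python
from collections import Counter


def second_order_label_counts(n, adj, labels):
    """Count labels of nodes reachable from n in exactly two hops (via a neighbour)."""
    counts = Counter()
    for s in adj[n]:                         # s is a direct neighbour of n
        for m in adj[s]:                     # m is reached through s
            if m != n:                       # assumption: ignore the walk back to n
                counts[labels[m]] += 1
    return counts


# Tiny bipartite example: two women (a, b) attend the same two events (x, y).
adj = {"a": ["x", "y"], "b": ["x", "y"], "x": ["a", "b"], "y": ["a", "b"]}
labels = {"a": "A", "b": "B", "x": "X", "y": "Y"}
counts = second_order_label_counts("a", adj, labels)
```

Node `a` has no direct link to `b`, yet it sees label `B` twice through the shared events, which is what allows the two women to form a link-pattern community.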

Considering the above, the balanced propagation based algorithm presented in Section 2.1 can be
extended for link-pattern communities in a rather ad hoc fashion. Let $\delta_l$ be a community
dependent parameter, $\delta_l \in [0, 1]$, such that $\delta_l \approx 1$ and $\delta_l \approx 0$
for link-density and link-pattern communities, respectively. Thus, when $\delta_l$ varies from 1 to
0, communities range from classical link-density communities to proper link-pattern communities.
Balanced propagation in equation (6) can then be simply advanced into a general community detection
algorithm as

$$c_n = \operatorname*{argmax}_l \Biggl( \delta_l \sum_{m \in \mathcal{N}^l(n)} b_m d_m \;+\; (1 - \delta_l) \sum_{m \in \mathcal{N}^l(s) \,\mid\, s \in \mathcal{N}(n)} [\ldots] \Biggr), \qquad (7)$$

where, similarly as in equation (5), diffusion values $d'_n$ are estimated using random walks. Hence,

$$d'_n = \sum [\ldots] \qquad (8)$$

[The summand of the second sum in equation (7), which involves the diffusion values $d'_m$ and a
normalizing denominator, and the right-hand side of equation (8) are illegible in the source.]

(Denominators in equations (7), (8) provide that the sums are proportional to the degree of the
node $k_n$.) Otherwise, the proposed algorithm is identical as before, and is denoted general
propagation algorithm (GPA). Note that setting all $\delta_l$ to 1 yields the classical community
detection algorithm in equation (6).

For simplicity, in GPA all $\delta_l$ are fixed to 0.5. Still, the algorithm can detect either
link-density or link-pattern communities, or different mixtures of both, when they are clearly
depicted in the network's topology (Section 3). However, the algorithm can also detect communities
that are of neither link-density nor link-pattern type.

As our main intention is to unfold meaningful composites of mainly link-density and link-pattern
communities, we also propose a variant of the algorithm denoted GPAc. The latter algorithm
re-estimates the values $\delta_l$ on each iteration, in order to reveal clearer community
structure. In particular, we measure the quality of each community using the conductance $\Phi$
[50], to determine whether the community better conforms with the link-density or link-pattern
regime. (The conductance measures the goodness of a link-density community, or equivalently, the
quality of the corresponding network cut.) As good link-density communities exhibit low values of
conductance, and good link-pattern communities exhibit high values, after each iteration of the
algorithm (though omitted on first) we set $\delta_l$ according to

$$\delta_l = 1 - \Phi(l) = \frac{1}{k^l} \sum_{n \in N^l} k_n^l, \qquad (9)$$

where $k^l$ is the strength of community $l$, $k^l = \sum_{n \in N^l} k_n$, and $k_n^l$ is the
number of neighbors of $n$ within community $l$ (initially all $\delta_l$ are set to 0.5). As the
strategy adjusts values of $\delta_l$ with respect to each individual community, the algorithm more
accurately reveals different composites of link-density and link-pattern communities (Section 3).

For networks with clear assortative or disassortative mixing, values $\delta_l$ can in fact be more
accurately estimated on the level of the entire network (Section 3). Hence,

$$\delta_l = \frac{1}{|N|} \sum_{n \in N} \bigl( 1 - \Phi(c_n) \bigr), \qquad (10)$$

while the resulting algorithm is denoted GPAn.

All proposed algorithms have complexity near $O(k|E|)$, where $k$ is the average degree in the
network.
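The conductance-based re-estimation can be sketched in Python as below. The sketch assumes a simple undirected graph stored as an adjacency dictionary and uses the volume-normalized form of conductance (cut edges divided by the community strength); the function names are hypothetical, and the mapping from conductance to the community parameter (one minus conductance) follows the low/high conductance argument in the text rather than a verified formula.

```python
def conductance(community, adj):
    """Fraction of edge endpoints of the community that lead outside of it."""
    members = set(community)
    strength = sum(len(adj[n]) for n in members)     # sum of member degrees
    if strength == 0:
        return 0.0                                   # no edges at all
    cut = sum(1 for n in members for m in adj[n] if m not in members)
    return cut / strength


def delta_from_conductance(community, adj):
    """Near 1 for cohesive (link-density), near 0 for link-pattern communities."""
    return 1.0 - conductance(community, adj)


# Triangle {0, 1, 2} with one outgoing edge 2-3, and a bipartite pair {a, b}.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2],
       "a": ["x", "y"], "b": ["x", "y"], "x": ["a", "b"], "y": ["a", "b"]}
dense = delta_from_conductance([0, 1, 2], adj)
pattern = delta_from_conductance(["a", "b"], adj)
```

The triangle has one cut edge out of a total strength of 7, whereas every edge of the bipartite pair leaves the community, so the two values fall at opposite ends of the unit interval.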



3 Results and discussion 

In the following sections we analyze the proposed algorithms on different synthetic and real-world
networks (Section 3.1 and Section 3.2, respectively).



General propagation algorithms (i.e., GPA, GPAc and GPAn) are compared against two other
approaches. As a representative of classical community detection algorithms, we employ basic label
propagation (i.e., LPA). Next, we also adopt the mixture model with expectation-maximization [51]
proposed by Newman and Leicht [23] (denoted MMem). Their algorithm can detect arbitrary network
modules and is currently among state-of-the-art approaches for detection of link-pattern
communities [23, 25]. Still, it demands the number of communities to be known beforehand. Note
that the exact number of communities (currently) cannot be adequately estimated in large real-world
networks [52]. For simplicity, we limit the number of iterations to 100 for all the algorithms.

The results are assessed in terms of normalized mutual information (NMI) [9], which has become a
de facto standard in community detection literature. Let $\mathcal{C}$ be a partition revealed by
the algorithm and let $\mathcal{P}$ be the true partition of the network (corresponding random
variables are $C$ and $P$, respectively). NMI of $\mathcal{C}$ and $\mathcal{P}$ is then

$$\mathrm{NMI} = \frac{2 I(C, P)}{H(C) + H(P)}, \qquad (11)$$

where $I(C, P)$ is the mutual information of the partitions, i.e., $I(C, P) = H(C) - H(C|P)$, and
$H(C)$, $H(P)$ and $H(C|P)$ are standard and conditional entropies. NMI of identical partitions
equals 1, and is 0 for independent ones.
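Equation (11) can be evaluated directly from two label sequences, as in the following Python sketch. The function name `nmi`, the list-of-labels input format and the convention of returning 1 when both partitions consist of a single community are assumptions of this illustration.

```python
import math
from collections import Counter


def nmi(found, truth):
    """Normalized mutual information 2 I(C,P) / (H(C) + H(P)) of two labelings."""
    n = len(found)
    pc = Counter(found)                       # community sizes in the found partition
    pp = Counter(truth)                       # community sizes in the true partition
    joint = Counter(zip(found, truth))        # sizes of pairwise intersections
    h_c = -sum(v / n * math.log(v / n) for v in pc.values())
    h_p = -sum(v / n * math.log(v / n) for v in pp.values())
    mi = sum(v / n * math.log((v / n) / ((pc[c] / n) * (pp[p] / n)))
             for (c, p), v in joint.items())
    if h_c + h_p == 0:
        return 1.0                            # both partitions are trivial and identical
    return 2.0 * mi / (h_c + h_p)


same = nmi([0, 0, 1, 1], ["a", "a", "b", "b"])
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

The first call compares partitions that differ only in label names, the second compares two partitions that cut the nodes in orthogonal ways; these correspond to the two extreme cases mentioned in the text.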

3.1 Synthetic networks 

Fig. 2. (Color online) Mean NMI over 1000 realizations of synthetic networks with two communities.
Error bars showing standard error of the mean are smaller than the symbol sizes.

The algorithms were first applied to synthetic benchmark networks with two communities of 32 nodes.
Average degree is fixed to 6, while the community structure is controlled by a mixing parameter
$\mu$, $\mu \in [0, 1]$. When $\mu$ equals 0, all edges are (randomly) placed between the nodes of
the same community, and when $\mu$ equals 1, all edges are (randomly) placed between the nodes of
different communities. Thus, when $\mu$ varies from 0 to 1, community structure ranges between
link-density and link-pattern regime (i.e., assortative and disassortative mixing). Note that
network structure is completely random for $\mu = 0.5$.
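A benchmark of this kind can be generated along the following lines. The sketch is only one plausible reading of the description: it builds a simple graph (no self-loops or repeated edges) with a fixed number of edges, and the function name `two_community_network`, its parameters and the rejection-sampling loop are assumptions for illustration rather than the authors' generator.

```python
import random


def two_community_network(mu, size=32, avg_degree=6, seed=0):
    """Edge set on 2*size nodes; nodes 0..size-1 and size..2*size-1 are the communities."""
    rng = random.Random(seed)
    n = 2 * size
    target = n * avg_degree // 2              # number of edges for the average degree
    edges = set()
    while len(edges) < target:
        u = rng.randrange(n)
        own_offset = 0 if u < size else size
        other_offset = size if u < size else 0
        if rng.random() < mu:
            v = rng.randrange(size) + other_offset   # edge between the communities
        else:
            v = rng.randrange(size) + own_offset     # edge inside u's community
        if u != v:
            edges.add((min(u, v), max(u, v)))        # store undirected edge once
    return edges


assortative = two_community_network(0.0)
disassortative = two_community_network(1.0)
```

Setting `mu=0.0` yields two cohesive modules with no edges between them, while `mu=1.0` yields a bipartite network in which the same two node groups form link-pattern communities.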

The results appear in Figure 2. As anticipated, the classical community detection algorithm LPA is
unable to distinguish between a network with disassortative mixing and a completely random network
(i.e., $\mu \approx 1$ and $\mu \approx 0.5$, respectively). Moreover, LPA also has the worst
performance for all community regimes. On the other hand, the mixture model MMem performs
significantly better than other algorithms, especially in the case of link-pattern communities
(i.e., $\mu > 0.5$). However, we argue that this is largely due to the fact that the algorithm is
given the true number of communities in advance.

Observe that general propagation algorithms GPA and GPAn can indeed detect both link-density and
link-pattern communities. However, the algorithm with a network-wise re-estimation of $\delta_l$
performs slightly better, except when the structure results in clear link-density communities
(i.e., $\mu < 0.1$). Still, the analysis on real-world networks in Section 3.2 confirms that GPAn
more accurately reveals different types of communities (including link-density).

We further apply the algorithms to a class of benchmark networks also adopted in [35]. The latter
is in fact a generalization of the benchmark proposed by Girvan and Newman [1] for classical
community detection. More precisely, networks comprise four communities of 32 nodes, where two
communities correspond to classical link-density modules, while the other two form a bipartite
structure of link-pattern communities. The networks are thus neither assortative nor disassortative
(but locally assortative or disassortative). Average degree is fixed to 16, while the community
structure is again controlled by a mixing parameter $\mu$, $\mu \in [0, 1]$. Lower values correspond
to clearer community structure; when $\mu = 0.5$, one half of the edges is set according to the
designed structure, while the other half is placed at random (on average).



The results in Figure 3 also report the performance of LPA, although a classical community
detection algorithm is obviously not suited for these networks. However, one can thus observe that,
when community structure is rather clearly defined (i.e., $\mu < 0.25$), only a small improvement
can be achieved with a general community detection algorithm (on these networks). Therefore, to
more accurately estimate the performance of GPA and GPAc, we increase the value of parameter
$\beta$ to 4 (Section 2.1). This further stabilizes the community structure identified by the
algorithms, however, the computational time thus increases.

The mixture model MMem performs significantly better than other algorithms, which could be
attributed to a known number of communities as above. Otherwise, general propagation algorithms GPA
and GPAc both detect link-density and link-pattern communities within these networks, however, only
as long as communities are clearly depicted in the networks' topologies (i.e., $\mu < 0.25$). When
$\mu$ further increases, the algorithm with a cluster-wise re-estimation of $\delta_l$ still manages
to reveal (link-density) communities to some extent, whereas GPA already fails.

Considering also the results reported in [?], the image graph approach of Pinkert et al. [25]
performs even slightly better than MMem, while the model selection of Rosvall and Bergstrom [2] is
a bit worse than GPAc. Thus, we conclude that general propagation algorithms can indeed reveal
link-density and link-pattern communities under the same framework, still, the accuracy on these
networks is worse with respect to some other state-of-the-art approaches. However, all these
approaches demand the number of communities to be given a priori, thus, the algorithms are actually
not fully comparable. Moreover, analysis on real-world networks in Section 3.2 reveals that, when
the number of communities increases, the above advantage is in fact rendered useless.

Note that the above benchmark networks represent a relatively poor description of real-world
network structure (see Section 3.2). However, construction of networks with both assortative and
disassortative mixing is not straightforward, as one inevitably has to define how link-pattern
communities connect with the rest of the network. Still, generalization of hierarchical network
model appears as the most prominent formulation of different community regimes. Here, probabilities
assigned to nodes of a predefined hierarchy of communities dictate the connections between the
nodes in the network. High probabilities at the bottom level of the hierarchy yield classical
cohesive modules, whereas link-pattern communities are characterized by higher probabilities at one
level above.

Fig. 3. (Color online) Mean NMI over 100 realizations of synthetic networks with four communities.
Error bars show standard error of the mean.

Table 1. Mean NMI over 10000 and 1000 runs for karate, women and football, corporate networks,
respectively.

Network     Communities   LPA      GPA      GPAn     GPAc     MMem
karate      2             0.6501   0.6992   0.7625   0.7547   0.7806
football    12            0.8908   0.8464   0.8570   0.8493   0.8069
women       4             -        0.7663   0.7680   0.7675   0.8337
corporate   8(9)          -        0.6680   0.6735   0.6651   0.5995

To further validate the proposition, we have also applied the propagation algorithms to a random
graph a la Erdős-Rényi [54] that (presumably) has no community structure. The number of nodes is
fixed to 256, while we vary the average degree $k$ between 2 and 64. When $k$ exceeds a certain
threshold, all algorithms reveal only trivial communities (i.e., connected components of the
network). The transition occurs at $k \approx 8$, $k \approx 10$ and $k \approx 12$ for LPA, GPA
and GPAc, and GPAn, respectively. Hence, community structures revealed by general propagation
algorithms are beyond simple random configurations, while the algorithms are also not subject to
resolution limit [55] issues (i.e., existence of an intrinsic scale, below which communities are no
longer recognized).

3.2 Real-world networks 

The proposed algorithms were further applied to ten real-world networks with community structure
(Table 2). All these networks are commonly analyzed in the community detection literature and
include different social, technological, information and biological networks (detailed description
is omitted). For simplicity, all networks are treated as unweighted and undirected. Furthermore,
corporate, jung and javax networks are reduced to their largest connected components and treated as
simple graphs.

We first consider four well known social networks, namely, karate, football, women and corporate
networks. The former two represent classical benchmarks for link-density community detection, as
they reveal clear assortative mixing (Figure 1(a)). On the other hand, the latter two are in fact
bipartite networks, thus, the respective network communities can be considered of pure link-pattern
type (Figure 1(b)). However, the networks are not properly disassortative, due to different types
of nodes.

All these networks have known sociological partitions into communities that result from earlier
studies, while the partition of the corporate network is limited to only 86 corporate nodes.
Comparison between community structures extracted by different algorithms and known network
structures can be seen in Table 1. The number of communities in the MMem algorithm is set to the
true value for all networks except corporate, where we set it to nine (Table 1).



Although the mixture model MMem performs better than general propagation algorithms on synthetic
benchmark networks (Section 3.1), this advantage appears to depend on the number of communities.
When the number of communities, and thus the size of the network, is relatively small (i.e., karate
and women networks), MMem most accurately reveals the true network structure. However, when the
number of communities increases (i.e., football and corporate networks), all propagation algorithms
significantly outperform MMem. The latter can be related to the previously discussed weakness of
MMem.

Note that the somewhat lower performance of propagation algorithms on karate and women networks is
actually due to the fact that the algorithms reveal three communities in these networks, which does
not coincide with the sociological partitioning of the nodes. In particular, the algorithms extract
a small module from the larger community in the karate network (Figure 1(a)), and merge the two
communities representing events in the women network (Figure 1(b)). However, similarly as in the
case of sociological communities, both these structures are well supported by the networks'
topologies and thus commonly reported by community detection algorithms in the literature.
Considering the partition of the women network with three communities, GPA, GPAn and GPAc reveal
structures with NMI equal to 0.8769, 0.8809 and 0.8799, respectively, while MMem obtains only
0.8027 (on average).

General propagation algorithms with re-estimation of 
5i, i.e., GPAc and GPAn, mostly outperform the basic 
GPA. As the algorithms adopt to either assortative or dis- 
assortative mixing regime in each network, they manage 
to extract the true communities more accurately. Observe 
also that network-wise re-estimation is somewhat more ad- 



Table 2. Real-world networks with community structure.

Network     Description                     Nodes     Edges
karate      Zachary's karate club. [39]     34        78
football    Amer. football league. pQ       115       616
women       Davis's south. women. [40]      18, 14    89
corporate   Scottish corporates. [SH]       131, 86   348
jung        JUNG graph library. [57]        305       710
javax       Java library (javax). [57]      705       3313
amazon      Amazon web graph. [58]          2879      5037
protein     S. cerevisiae proteins. [3]     2445      6265
gnutella    Gnutella peer-to-peer. [55]     62586     147892
condmatt    Cond. Matt. archive. [5U]       36458     171736

Fig. 4. (Color online) Community structures of (a) jung and (b) javax technological networks revealed with GPAc.
Node sizes are proportional to the community sizes, while the symbols (colors) correspond to the values of δi (equation M).



equate for these networks than the cluster-wise version, due
to a clear mixing regime. However, for networks with both
types of mixing, GPAc should obviously be employed.

We conclude that general propagation can reveal both
link-density and link-pattern communities in real-world
networks. Thus, exactly the same algorithm is suitable for
classical community detection in unipartite networks and
link-pattern community detection in multi-partite networks.
Given the high values of NMI in Table [I] (except for
corporate network), the proposed algorithms can also be
considered relatively accurate.

As the above social networks are particularly homogeneous,
they reveal either assortative or disassortative mixing.
Social networks could indeed comprise both
regimes; still, such networks would have to be heterogeneous
by nature (i.e., convey different types of relations
between individuals). In fact, heterogeneity seems to be a
necessary condition for a network to reveal different
composites of link-density and link-pattern communities. In
the following we analyze four of the remaining networks
in Table [2] that are all heterogeneous by nature.

Our main intention in the following is to reveal meaningful
composites of not only link-density but also link-pattern
communities, and thus imply that such structures
could appear ubiquitous in various complex networks.
Therefore, we apply GPAc to each network 10 times, and report
the structure with the highest fraction of nodes within
link-pattern communities. It should be noted that community
structures of these networks should not be considered
identified, as networks possibly reveal a large number of
different structures that are all significant and well
supported by their topologies [S3] (e.g., communities exist on
different scales). Note that multiple structures could also
imply that no clear one exists (e.g., overlapping
communities 3J). However, general propagation algorithms find
no communities in random networks (Section 3.1); thus,
all revealed structures are at least beyond random.
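
The selection rule described above (run the algorithm several times and keep the structure with the largest fraction of nodes in link-pattern communities) can be sketched as follows. This is only an illustration under stated assumptions: the representation of a run as a list of (community size, parameter value) pairs, the function names, and in particular the 0.5 threshold below which a community is counted as link-pattern are choices made here, not details given in the text.

```python
from typing import List, Tuple

# One community: (number of nodes, per-community mixing parameter).
Community = Tuple[int, float]


def link_pattern_fraction(run: List[Community], threshold: float = 0.5) -> float:
    """Fraction of nodes that lie in communities with parameter below threshold."""
    total = sum(size for size, _ in run)
    if total == 0:
        return 0.0
    pattern = sum(size for size, delta in run if delta < threshold)
    return pattern / total


def select_run(runs: List[List[Community]], threshold: float = 0.5) -> List[Community]:
    """Return the run with the highest link-pattern node fraction."""
    return max(runs, key=lambda run: link_pattern_fraction(run, threshold))


# Three hypothetical runs on a 100-node network.
runs = [
    [(60, 0.9), (40, 0.1)],
    [(50, 0.8), (30, 0.2), (20, 0.0)],
    [(100, 0.7)],
]
best = select_run(runs)
```

Here the second run is selected, since half of its nodes fall into communities below the assumed threshold.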

First, we analyze two technological networks, namely,
jung and javax networks (Table [2]). These are class
dependency networks, where nodes correspond to software
classes and edges represent different types of dependencies
among them (e.g., inheritance, parameters, variables,
etc.). The networks are thus obviously heterogeneous and
should comprise different types of communities [57].

Revealed community structures are shown in Figure [4].
Observe that the networks convey both clear link-density and
link-pattern communities, whereas the latter are further
combined in rather complex configurations (i.e., shaded
regions in Figure [4]). In particular, besides simple
bipartite structures and isolated link-pattern communities, the
networks also reveal connected clusters of multiple link-pattern
communities. Note that, although link-pattern communities
are mainly connected between themselves, they can
also be strongly connected with otherwise cohesive modules of
nodes. Moreover, both link-density and link-pattern
communities can reside in either the network interior or periphery.

We next analyze the main communities in greater
detail (Table [3]). The core, i.e., the major link-density
community, of jung network (Figure [4] (a)) consists of only
visualization classes, while these are otherwise almost nonexistent in
other communities. As one could anticipate, the
community is highly cohesive and independent from the rest of
the network. The two link-pattern communities on the
right-hand side contain utility classes for the GraphML format;
while the upper community mainly contains different
parsers, the lower mostly consists of meta-data classes, used
by the former. Thus, the number of inter-community edges
is obviously high. The central configuration of five link-pattern
communities also contains well defined modules with
particularly clear functional roles. More precisely, the
communities contain basic graph classes, interfaces for various
algorithms, their implementations, different layout classes and
filters, respectively. The strength of connections among
the communities further supports this functional
differentiation (e.g., implementations of different algorithms are
strongly dependent on various interfaces and graph classes).
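
The strength of connections among communities referred to above can be quantified by counting the edges within and between each pair of communities. The following is a minimal sketch for an undirected edge list; the node names, community labels and the helper `community_edge_counts` are hypothetical and serve only to illustrate the bookkeeping.

```python
from collections import Counter


def community_edge_counts(edges, membership):
    """Count edges within and between communities of an undirected network.

    Keys are sorted pairs (g, h) of community labels, so (g, g) holds the
    number of intra-community edges of community g.
    """
    counts = Counter()
    for u, v in edges:
        g, h = sorted((membership[u], membership[v]))
        counts[(g, h)] += 1
    return counts


# Toy example: two communities whose nodes link mostly to the other
# community, as in a link-pattern (near-bipartite) configuration.
edges = [
    ("Parser", "Metadata"), ("Reader", "Metadata"),
    ("Parser", "Key"), ("Reader", "Key"),
    ("Parser", "Reader"),
]
membership = {
    "Parser": "parsers", "Reader": "parsers",
    "Metadata": "meta-data", "Key": "meta-data",
}
counts = community_edge_counts(edges, membership)
```

In this toy network four of the five edges run between the two communities and only one lies inside a community, which is the signature of a link-pattern configuration rather than of cohesive modules.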

Similarly clear communities are also revealed in javax
network (Figure [4] (b)). The core of the network consists of
look-and-feel classes for different GUI components. Note
that the majority of classes differ only in a small part of
their name, which indicates the respective GUI component
and look-and-feel implementation. In contrast to before,
Table 3. Analysis of community structures revealed in technological networks (Figure [4]). 'core' denotes the largest
link-density community, while 'k-configuration'-s represent shaded regions in Figure [4] (k is the number of link-pattern
communities).

Network  Community              Size  δi     Description
jung     core                   65    0.86   [jung.visualization.] *(Server|Viewer|Pane|Model|Context) (9);
                                             control.* (4); control.*Control (5); layout.* (7);
                                             picking.*State (3); picking.*Support (6);
                                             renderers.*Renderer (13); renderers.*Support (3); etc.
         5-conf. (upper left)   3     0.00   [jung.algorithms.filters.] *Filter (3).
         5-conf. (upper right)  21    0.33   [jung.graph.] *(Graph|Multigraph|Tree) (18); etc.
         5-conf. (central)      lb    U.U /  [jung.] algorithms.generators.*Generator (2);
                                             algorithms.importance.* (4); algorithms.layout.*Layout* (3);
                                             algorithms.scoring.*Scorer (2); algorithms.shortestpath.* (2);
                                             graph.*(Graph|Tree|Forest) (4); etc. (interfaces)
         5-conf. (lower left)   13    0.00   [jung.algorithms.] layout.*Layout* (7); layout3d.*Layout (3); etc.
         5-conf. (lower right)  44    0.03   [jung.] algorithms.cluster.*Clusterer* (4);
                                             algorithms.generators.random.*Generator (5);
                                             algorithms.importance.*Betweenness* (3); algorithms.metrics.* (3);
                                             algorithms.scoring.** (5); algorithms.shortestpath.* (5);
                                             graph.util.* (7); etc. (implementations)
         2-conf. (upper)        13    0.03   [jung.io.graphml.] parser.*Parser (10); etc.
         2-conf. (lower)        13    0.38   [jung.io.graphml.] *Metadata (8); etc.
         1-conf. (central)      2     0.00   [jung.visualization.control.] *Plugin (2).
javax    core                   179   0.64   [javax.swing.] plaf.*UI (24); plaf.basic.Basic*UI (42);
                                             plaf.metal.Metal*UI (22); plaf.multi.Multi*UI (30);
                                             plaf.synth.Synth*UI (40); etc.
         3-conf. (upper)        193   0.15   [javax.] accessibility.Accessible* (10); swing.J* (41);
                                             swing.**(Border|Borders|Box|Button|Dialog|Divider|Editor|
                                             Factory|Filter|Icon|Kit|LookAndFeel|Listener|Model|Pane|
                                             Panel|Popup|Renderer|UIResource|View) (92); etc.
         3-conf. (left)         113   0.11   [javax.] accessibility.Accessible* (6); swing.* (34);
                                             swing.event.*Event (8); swing.event.*Listener (13);
                                             swing.plaf.*UI (6); etc.
         3-conf. (lower)        44    0.19   [javax.swing.] text.*View (15); text.html.*View (16); etc.




the community is not highly cohesive, as these classes are
extensively used by, e.g., various GUI components. The
latter in fact appear within the largest link-pattern
community, which is thus strongly dependent on the former.
Note also that this link-pattern community consists
of almost all GUI components of Java, although they
reside in various packages and their names (i.e., functions)
differ substantially. For more details on community
structures of both technological networks see Table [3].
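
The descriptions in Table 3 summarize each community by wildcard patterns over class names, followed by the number of matching classes (e.g., `*Renderer (13)`). A minimal sketch of how such a summary could be produced is given below. The class names in the example are hypothetical stand-ins rather than actual members of any community, and the helper `summarize`, including its rule of assigning each class to the first matching pattern, is an assumption introduced here for illustration.

```python
from fnmatch import fnmatchcase


def summarize(class_names, patterns):
    """Count, for each wildcard pattern, the classes assigned to it.

    Each class is assigned to the first pattern that matches it, so the
    counts add up to the community size; unmatched classes go to 'etc.'.
    """
    counts = {pattern: 0 for pattern in patterns}
    counts["etc."] = 0
    for name in class_names:
        for pattern in patterns:
            if fnmatchcase(name, pattern):
                counts[pattern] += 1
                break
        else:
            counts["etc."] += 1
    return counts


# Hypothetical community members (package-relative class names).
community = [
    "renderers.EdgeRenderer", "renderers.VertexRenderer",
    "picking.PickState", "picking.MultiState",
    "layout.Cache",
]
patterns = ["renderers.*Renderer", "picking.*State", "control.*"]
summary = summarize(community, patterns)
```

A community dominated by a few such patterns, as in the table, indicates that its members share a functional role even when their connections to each other are sparse.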

Despite the mostly qualitative analysis, general propagation
algorithms indeed reveal significant community structures
within these technological networks, while the
communities can also be related to particularly clear
functional roles. Obviously, the latter could not be detected
under the classical framework of merely cohesive modules.
Note also that the proposed algorithms do not only
partition the underlying software systems, as in the case of
classical community detection, but also reveal important
dependencies among different subsystems that would
otherwise remain concealed. It ought to be mentioned that we
have previously conjectured the existence of other modules
besides classical communities in software networks [57].

Next, we analyze the community structure of amazon
information network that represents a small sample
of the Amazon web graph (Table [2]). The revealed network
structure can be seen in Figure [5]. Due to the size of the
network and the nature of the domain, an exact analysis
of extracted communities could not be conducted. Still, in
the following, we discuss the main properties and highlight
some interesting observations.

A large number of nodes are classified into the dense core of
the network (1381 nodes); however, the algorithm also
reveals five well defined communities in the periphery (with
300 nodes on average). Thus, as one could anticipate, the
extracted partition rather accurately coincides with the
core-periphery structure [7] that is commonly found in
information networks |61I7]. For reference, value of δi for the
core equals 0.86, and is for the only link-pattern
community. Communities in periphery exhibit 0.86 on average.

We have analyzed the link-pattern community in greater
detail and observed that the majority of its nodes
correspond to web pages on musical instruments² sold on
Amazon. In particular, 231 of 288 nodes represent web pages on
various instruments, while each page corresponds to a
different brand (e.g., Yamaha). What makes the community
particularly significant is the fact that only one of the other
2591 nodes in the network also represents a web page on
musical instruments (the latter is the node connected to all
2 This can be determined by the occurrence of '11091801' 
within the URL of the respective web page. 
Fig. 5. (Color online) Community structure of amazon
information network revealed with GPAc. Edge directions
were not considered by the algorithm.

nodes in the respective community). Hence, the algorithm
manages to extract a meaningful link-pattern community
from the core of the network, while the community is not
only exhaustive but also rather clear.

Observe that link-density communities generally connect
more strongly towards the core of the network, whereas,
in the case of the link-pattern community, the connection is
significantly stronger in the direction from the core. As
the network was treated as undirected, the latter cannot
be considered an artifact of the algorithm. The
revealed pattern could in this context imply that nodes in
link-pattern communities provide important content (i.e.,
authority nodes [55]), while hub nodes [53] reside mainly
in link-density communities. Again, the occurrence of
different types of communities can be related to a form of
network heterogeneity (i.e., edge directions).

For a complete analysis, we also apply the algorithm to
an example of a biological network (that is also
heterogeneous by definition). In particular, we analyze protein
network that represents protein-protein interactions of yeast
Saccharomyces cerevisiae (Table [2]). The revealed
community structure appears in Figure [6], while a detailed
description of communities is omitted. Observe that the
algorithm reveals a large number of clear link-density and
link-pattern communities of various sizes (171 communities of
2 to 127 nodes), while both exist in the interior and the
periphery of the network. Different types of communities
are combined in complex configurations (shaded regions
in Figure [6]), which, as in the examples above, suggests
that link-pattern communities, similarly as link-density
counterparts, are ubiquitous in real-world networks.

Last, we also analyze community structures revealed
with GPAc in the two remaining networks in Table [2]. More
precisely, we consider gnutella information network of
peer-to-peer communications within Gnutella file sharing, and
condmatt social network representing scientific author
collaborations extracted from Condensed Matter archive. While
the former can be characterized by a unique disassortative
behavior, the latter is in fact a prominent example
of assortative mixing, and thus classical community
structure. Indeed, more than 92 percent of the nodes in
gnutella network are classified into 2670 link-pattern
communities, while, on the contrary, almost 85 percent of the
nodes in condmatt network reside in 2100 classical
link-density modules (δi equals 0.30 and 0.64 on average,
respectively). Figure [7] also shows cumulative community
size distributions for both networks. Although the
distribution for condmatt network appears to be power-law for the
most part, as commonly observed in classical community
detection |22I3), the latter does not hold for gnutella
network. In particular, communities most distinctively exist
on two scales with tens and hundreds of nodes, which
provides some evidence that link-pattern communities might
reflect in disassortative mixing by degree (Section [2]).



4 Conclusions



The paper proposes a balanced propagation based
algorithm for detection of arbitrary network modules,
ranging from classical cohesive (link-density) communities to
more general link-pattern communities. The proposed
algorithm was first validated on synthetic benchmark
networks with community structure, and also on random
networks. It was then further applied to different social,
technological, information and biological networks, where it
indeed reveals significant (composites of) link-density and
link-pattern communities. In the case of larger real-world
networks, the proposed algorithm detects the true
communities more accurately than a state-of-the-art algorithm,
while, in contrast to other approaches proposed in the
literature, it does not require any prior knowledge of the
true network structure. The latter is in fact crucial for the
analysis of large real-world networks [52].

Heterogeneity appears to be a necessary condition for
the network to reveal both link-density and link-pattern
communities. However, although often not apparent at
first sight, most real-world networks are in fact
heterogeneous by nature. Qualitative results on real-world
networks further imply that link-pattern communities,
similarly as link-density counterparts, appear ubiquitous in
nature and technology. Moreover, link-pattern communities
are also commonly combined with classical modules
into complex configurations; thus, different types of
communities should not be analyzed independently. A
generative model or measure for a general community structure
of real-world networks would be of great benefit. It ought
to be mentioned that the existence of link-pattern
communities in real-world networks has implications in numerous
other fields of network science (e.g., dynamic processes).

Fig. 6. (Color online) Community structure of protein
biological network revealed with GPAc.

Fig. 7. (Color online) Cumulative size distributions of
community structures revealed with GPAc in gnutella
information and condmatt social networks.
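
The cumulative community size distributions of Fig. 7 can be obtained directly from a list of community sizes. The sketch below assumes the usual definition, namely the fraction of communities with at least s nodes for each observed size s; the sizes used are made-up toy values, not data from the gnutella or condmatt networks.

```python
from collections import Counter


def cumulative_size_distribution(sizes):
    """Return sorted (s, P(size >= s)) pairs for the distinct sizes s."""
    if not sizes:
        return []
    n = len(sizes)
    counts = Counter(sizes)
    result = []
    remaining = n  # number of communities with size >= current s
    for s in sorted(counts):
        result.append((s, remaining / n))
        remaining -= counts[s]
    return result


# Toy community sizes (hypothetical).
sizes = [2, 2, 3, 5, 5, 5, 8, 40]
dist = cumulative_size_distribution(sizes)
```

Plotting the resulting pairs on doubly logarithmic axes gives a roughly straight line for a power-law distribution, whereas communities concentrated on two separate scales would show up as two distinct drops.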

The analysis in the paper does not directly imply which
common properties of real-world networks one can expect
under link-density or link-pattern regime. However,
further work demonstrates that most significant link-pattern
communities are revealed in regions with low values of
clustering coefficients [63, 64], while just the opposite holds
for classical modules. Furthermore, link-pattern communities
may be the origin of degree disassortativity observed
in various real-world networks [42, 43], while they also
commonly contradict the small-world phenomena [63]. Hence,
different network properties seem to be governed by the
same underlying principle fff, which represents a
prominent direction for future research.
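
To illustrate why link-pattern communities are associated with low clustering, the following sketch computes the local clustering coefficient on two toy graphs: a clique (an idealized link-density community) and a complete bipartite graph (an idealized pair of link-pattern communities). The graphs and function names are illustrative assumptions and are not taken from the analysis in the cited work.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


def clique(n):
    """Adjacency sets of a complete graph on n nodes."""
    return {i: {j for j in range(n) if j != i} for i in range(n)}


def complete_bipartite(a, b):
    """Adjacency sets of a complete bipartite graph with parts of size a and b."""
    left = range(a)
    right = range(a, a + b)
    adj = {i: set(right) for i in left}
    adj.update({j: set(left) for j in right})
    return adj


dense = average_clustering(clique(5))
pattern = average_clustering(complete_bipartite(3, 4))
```

In the clique every pair of neighbours is linked, so the coefficient is maximal, while in the bipartite graph no two neighbours of a node are ever linked and the coefficient vanishes.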

This work has been supported by the Slovene Research Agency 
ARRS within Research Program No. P2-0359. 